Udacity Data Analysis Nanodegree

Project 5: Communicate data findings


Predicting flight delays

by Juanita Smith

Investigation Overview

Have you ever been stuck in an airport because your flight was delayed and wondered if you could have predicted it if you'd had more data? This is our chance to find out!

In this investigation, I wanted to explore which characteristics predict delays the most. Several avenues were explored: seasonality patterns, airport/carrier relationships, distance (short-haul vs long-haul flights) and multiple repeated flight legs per day.

Key insights:
  1. Setting the scene by showing the distribution of flights: 21% of flights are delayed
  2. Seasonal peaks are the main reason for delays
  3. Carrier and Airport Analysis: All airports and carriers have delays during seasonal peaks; however, certain airport and carrier relationships have above-average delays
  4. What are the reasons for higher than average delays for certain airport/carrier relationships?


Dataset Overview

This dataset reports flights in the United States, including carriers, arrival and departure delays, and reasons for delays, from 1987 to 2008. Due to the large data volume, only the years 2003 - 2007 are analysed in this project.

1. Distribution of flights

How many flights get delayed or cancelled?
Conclusion:
Around 77% of flights are on time, whereas 21% are delayed. Only 2% of flights are cancelled or diverted.
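
The split above can be reproduced by classifying each flight and counting the classes. The following is a minimal sketch, not the notebook's actual code: the field names (`ArrDelay`, `Cancelled`, `Diverted`) and the 15-minute delay threshold are assumptions, since the report does not state how "delayed" was defined, and the sample records are made up.

```python
from collections import Counter

# Assumption: a flight counts as delayed when it arrives 15 or more minutes late.
# The report does not state the threshold it used.
DELAY_THRESHOLD_MIN = 15


def flight_status(flight):
    """Classify one flight record (a dict with hypothetical field names)."""
    # Cancelled/diverted flights have no usable arrival delay, so check them first.
    if flight["Cancelled"] or flight["Diverted"]:
        return "cancelled/diverted"
    if flight["ArrDelay"] >= DELAY_THRESHOLD_MIN:
        return "delayed"
    return "on time"


def status_shares(flights):
    """Return the share of flights in each status class."""
    counts = Counter(flight_status(f) for f in flights)
    total = sum(counts.values())
    return {status: count / total for status, count in counts.items()}


# Tiny made-up sample, for illustration only.
sample = [
    {"ArrDelay": 0, "Cancelled": 0, "Diverted": 0},
    {"ArrDelay": 42, "Cancelled": 0, "Diverted": 0},
    {"ArrDelay": -5, "Cancelled": 0, "Diverted": 0},
    {"ArrDelay": None, "Cancelled": 1, "Diverted": 0},
]
shares = status_shares(sample)
```

On the full dataset, the same grouping would be expected to give shares close to the 77% / 21% / 2% quoted above, provided the threshold matches the one used in the analysis.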

2. Seasonal peaks are the main reason for delays

When is the best time of year/day of week/time of day to fly to minimise delays?

2a. Seasonal Analysis: Trend and Time series using statsmodels

Conclusion:
  • There is an upward trend in delayed flights year on year.
  • There is a strong seasonal pattern recurring every year. To obtain it, the trend is removed from the time series, and the average of this de-trended series for each period is returned as the seasonal component.
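
To make the de-trending step concrete, here is a simplified plain-Python sketch of a classical additive decomposition. It is not the statsmodels implementation, only an illustration of the same idea: estimate the trend with a centred moving average, subtract it, then average the remainder for each position in the period. The function names and the synthetic series are hypothetical.

```python
from statistics import mean


def centered_moving_average(series, period):
    """Estimate the trend; entries are None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        if period % 2:
            # Odd period: the window holds exactly one full period.
            trend[i] = mean(window)
        else:
            # Even period: give the two end points half weight each.
            weighted = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[i] = weighted / period
    return trend


def seasonal_component(series, period):
    """Average the de-trended values for each position within the period."""
    trend = centered_moving_average(series, period)
    detrended = [
        (i % period, value - t)
        for i, (value, t) in enumerate(zip(series, trend))
        if t is not None
    ]
    averages = [
        mean([d for position, d in detrended if position == p])
        for p in range(period)
    ]
    centre = mean(averages)  # centre the component around zero
    return [a - centre for a in averages]


# Synthetic series: linear upward trend plus a repeating pattern of length 4.
pattern = [3, -1, -2, 0]
series = [10 + 2 * i + pattern[i % 4] for i in range(12)]
seasonal = seasonal_component(series, period=4)
```

With monthly delay counts, `period` would be 12, so that each calendar month gets one seasonal value.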

2b. Seasonal Analysis: Monthly Distribution

In the first plot on the left in blue, which focuses on the distribution of delayed flights, 2 strong peaks are visible:
  • Christmas period in December - March
  • Summer months June - August


In the second plot on the right in grey, which plots the distribution of all flights, there is not much variation in the total number of flights per month, even in the peak periods.
Conclusion: Airports are busier during peak periods, even though the number of flights does not increase. This might be due to an increase in passengers in airports on fully booked flights.

2c. Seasonal Analysis: Weekday Distribution


In the first plot on the left in blue, which focuses on the distribution of delayed flights, delays are more likely to happen on Mondays, Thursdays and Fridays.

In the second plot on the right in grey, which plots the distribution of all flights, weekdays have the same number of flights, while weekends are quieter.
Conclusion: Most delays happen on weekdays, specifically Mondays, Thursdays and Fridays. Tuesdays and Saturdays are the quietest days. Airports are less busy, with fewer delays, on weekends.

2d. Seasonal Analysis: Time of day Distribution


In the first plot on the left in blue, which focuses on the distribution of delayed flights, delays grow progressively from 6am and peak between 5pm and 8pm.

In the second plot on the right in grey, which plots the distribution of all flights, the number of flights stays constant during the day and decreases from 8pm.
Conclusion: Delays grow progressively throughout the day, from 6am until 8pm, whereas the total number of flights stays constant. Delays peak between 5pm and 8pm. From 8pm, both the delays and the number of flights decrease.
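
Because the number of flights stays flat while delays grow, the share of delayed flights per hour is a convenient way to compare the two plots. The sketch below assumes hypothetical field names (`CRSDepTime` as an hhmm integer, `DepDelay` in minutes) and a 15-minute threshold that the report does not state; the sample data is made up.

```python
from collections import defaultdict


def delay_rate_by_hour(flights, threshold_min=15):
    """Share of delayed departures per scheduled departure hour."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for flight in flights:
        hour = flight["CRSDepTime"] // 100  # hhmm integer, e.g. 1745 -> 17
        totals[hour] += 1
        if flight["DepDelay"] >= threshold_min:
            delayed[hour] += 1
    return {hour: delayed[hour] / totals[hour] for hour in sorted(totals)}


# Made-up sample: (scheduled departure as hhmm, departure delay in minutes).
sample = [
    {"CRSDepTime": 600, "DepDelay": 0},
    {"CRSDepTime": 630, "DepDelay": 20},
    {"CRSDepTime": 1745, "DepDelay": 30},
    {"CRSDepTime": 1710, "DepDelay": 16},
    {"CRSDepTime": 1755, "DepDelay": 3},
    {"CRSDepTime": 2130, "DepDelay": 0},
]
rates = delay_rate_by_hour(sample)
```

The same function works for the weekday and monthly views by swapping the grouping key.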

3. Carrier and Airport Analysis

3.1 Busiest Departure Airports

Conclusion:
International airports in Atlanta (ATL), Chicago (ORD) and Dallas (DFW) are the busiest airports in the US, covering 15% of all flights

3.2 Busiest Departure Airports - How many flights are delayed?

Conclusion:
From a proportional perspective, John F Kennedy (JFK), Philadelphia (PHL), Newark (EWR) and Chicago (ORD) have over 40% of their flights delayed on departure.

3.3 Busiest Carriers

Conclusion:
Southwest Airlines (WN), American Airlines (AA) and Delta Airlines (DL) are the biggest airlines in the US, covering 33% of all flights.

3.4 Busiest Carriers - How many flights are delayed?

Conclusion:

Carriers with the most delayed flights from a proportional perspective:
  1. EV: Atlantic Southeast (36%)
  2. B6: JetBlue (32%)
  3. YV: Mesa Airlines (31%)
  4. MQ: American Eagle (31%)

3.5 Average delay by Carrier and the Departure Airport

All airports and carriers have delays during seasonal peaks; however, certain airports and carriers have above-average delays.
Conclusion:
  • The bottom right quadrant (problem quadrant) contains the carrier and airport combinations with the highest average delays; the darker green/blue area is clearly distinguishable.
  • The top 3 biggest carriers (WN, AA, DL) have below-average but consistent delays across all airports and appear in the top half of the heatmap.
  • 3 of the busiest departure airports (ATL, ORD, EWR) are positioned on the right of the heatmap. However, these airports are not consistently 'bad': carriers in the top half still do well at these airports.


The fact that the same carriers do well at airports in the left quadrants, but worse at airports in the right quadrants, indicates there is something specific about the carrier/location relationship that we should explore further.
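
The heatmap is essentially a table of mean delays per carrier/airport pair, read against the overall average. A minimal sketch of that calculation follows. The field names (`UniqueCarrier`, `Origin`, `DepDelay`) and the choice of departure delay are assumptions, the helper names are hypothetical, and the numbers are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_delay_matrix(flights):
    """Mean departure delay per (carrier, departure airport) pair."""
    delays = defaultdict(list)
    for flight in flights:
        delays[(flight["UniqueCarrier"], flight["Origin"])].append(flight["DepDelay"])
    return {pair: mean(values) for pair, values in delays.items()}


def above_average_pairs(flights):
    """Pairs whose mean delay exceeds the mean delay over all flights."""
    overall = mean([flight["DepDelay"] for flight in flights])
    matrix = mean_delay_matrix(flights)
    return sorted(pair for pair, average in matrix.items() if average > overall)


# Made-up sample data, for illustration only.
sample = [
    {"UniqueCarrier": "WN", "Origin": "ATL", "DepDelay": 5},
    {"UniqueCarrier": "WN", "Origin": "ATL", "DepDelay": 7},
    {"UniqueCarrier": "EV", "Origin": "ATL", "DepDelay": 30},
    {"UniqueCarrier": "EV", "Origin": "ATL", "DepDelay": 40},
    {"UniqueCarrier": "WN", "Origin": "EWR", "DepDelay": 10},
    {"UniqueCarrier": "EV", "Origin": "EWR", "DepDelay": 50},
]
matrix = mean_delay_matrix(sample)
problem_pairs = above_average_pairs(sample)
```

In this toy sample the same airport appears on both sides of the average depending on the carrier, which is the pattern the heatmap is meant to expose.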

4. What are the reasons for higher than average delays in the problem quadrant?

Why are certain carriers performing badly only at certain airports?
The problem quadrant is highlighted in red.

4.1 Reasons for delays

Why are certain carriers performing badly only at certain airports?
The 4 biggest airlines (HP, DL, AA, WN), which do consistently well at all airports, are compared with the 8 worst-performing carriers from the problem quadrant.
(The same color coding as in the heatmap is used: yellow/light green represents good, dark green/blue represents bad.)
Conclusion:
Generally, carriers on the right of the bar chart (from the problem quadrant) have above-average delays across all delay reasons at multiple airports.

4.2 Repetitive short-haul flights

Why are certain carriers performing badly only at certain airports?
Could it be that these carriers fly repetitive short-haul flights, and a delay in one leg causes delays for the rest of the day?

Conclusion:
Carriers in the problem quadrant (dark green/blue bars on the right) fly multiple but shorter distances of <= 500 miles, whereas carriers on the left of the quadrant (yellow/light green bars on the left) fly distances near or above 1000 miles.
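
One way to check the repetitive short-haul hypothesis is to count how many legs each aircraft flies per day and compare that with the carrier's average flight distance. The sketch below is illustrative only: the field names (`UniqueCarrier`, `TailNum`, `Distance`) and the combined `Date` key are assumptions, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean


def legs_and_distance_by_carrier(flights):
    """Mean legs per aircraft per day and mean distance (miles) per carrier."""
    legs = defaultdict(int)        # (carrier, tail number, date) -> flights flown
    distances = defaultdict(list)  # carrier -> distances of its flights
    for flight in flights:
        key = (flight["UniqueCarrier"], flight["TailNum"], flight["Date"])
        legs[key] += 1
        distances[flight["UniqueCarrier"]].append(flight["Distance"])

    legs_per_day = defaultdict(list)
    for (carrier, _tail, _date), count in legs.items():
        legs_per_day[carrier].append(count)

    return {
        carrier: {
            "mean_legs_per_aircraft_day": mean(legs_per_day[carrier]),
            "mean_distance": mean(distances[carrier]),
        }
        for carrier in distances
    }


# Made-up sample: one regional carrier (EV) and one larger carrier (WN).
sample = [
    {"UniqueCarrier": "EV", "TailNum": "N1", "Date": "2007-01-01", "Distance": 300},
    {"UniqueCarrier": "EV", "TailNum": "N1", "Date": "2007-01-01", "Distance": 250},
    {"UniqueCarrier": "EV", "TailNum": "N1", "Date": "2007-01-01", "Distance": 350},
    {"UniqueCarrier": "EV", "TailNum": "N2", "Date": "2007-01-01", "Distance": 300},
    {"UniqueCarrier": "WN", "TailNum": "N9", "Date": "2007-01-01", "Distance": 1000},
    {"UniqueCarrier": "WN", "TailNum": "N9", "Date": "2007-01-02", "Distance": 1200},
]
summary = legs_and_distance_by_carrier(sample)
```

A carrier with many legs per aircraft-day and a low mean distance has less slack to absorb an early delay, which is the mechanism suggested above.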

4.3 Average flight distance by carrier

Why are certain carriers performing badly only at certain airports?
Conclusion:
Generally, carriers in the problem quadrant (last 8 bars on the right) fly distances around 500 miles, whilst carriers on the left fly longer distances

Summary

  • Around 77% of flights are on time, whereas 21% of flights are delayed.
  • There is a strong seasonal pattern: the biggest peak is around Christmas time in December - March, and another is during the summer months June - August. Airports are quieter in the spring/autumn months.
  • Mondays, Thursdays and Fridays are the busiest days at airports; Tuesdays and Saturdays are the quietest.
  • Delays grow progressively throughout the day, with most delays happening between 17:00 and 20:00. Fly early in the morning or late in the evening to avoid delays.
  • Airports are busier during peak periods, even though the number of flights does not increase. This might be due to an increase in passengers in airports on fully booked flights.
  • Carriers that fly repetitive short-haul flights each day, with distances around 500 miles, have more delays than carriers that fly longer distances. Carriers flying longer distances have more time to make up for delays during the journey.